Explore and Summarize Data Project by David Smyth

##                    ListingKey     ListingNumber    
##  17A93590655669644DB4C06:     6   Min.   :      4  
##  349D3587495831350F0F648:     4   1st Qu.: 400919  
##  47C1359638497431975670B:     4   Median : 600554  
##  8474358854651984137201C:     4   Mean   : 627886  
##  DE8535960513435199406CE:     4   3rd Qu.: 892634  
##  04C13599434217079754AEE:     3   Max.   :1255725  
##  (Other)                :113912                    
##                     ListingCreationDate  CreditGrade         Term      
##  2013-10-02 17:20:16.550000000:     6          :84984   Min.   :12.00  
##  2013-08-28 20:31:41.107000000:     4   C      : 5649   1st Qu.:36.00  
##  2013-09-08 09:27:44.853000000:     4   D      : 5153   Median :36.00  
##  2013-12-06 05:43:13.830000000:     4   B      : 4389   Mean   :40.83  
##  2013-12-06 11:44:58.283000000:     4   AA     : 3509   3rd Qu.:36.00  
##  2013-08-21 07:25:22.360000000:     3   HR     : 3508   Max.   :60.00  
##  (Other)                      :113912   (Other): 6745                  
##                  LoanStatus                  ClosedDate   
##  Current              :56576                      :58848  
##  Completed            :38074   2014-03-04 00:00:00:  105  
##  Chargedoff           :11992   2014-02-19 00:00:00:  100  
##  Defaulted            : 5018   2014-02-11 00:00:00:   92  
##  Past Due (1-15 days) :  806   2012-10-30 00:00:00:   81  
##  Past Due (31-60 days):  363   2013-02-26 00:00:00:   78  
##  (Other)              : 1108   (Other)            :54633  
##   BorrowerAPR       BorrowerRate     LenderYield     
##  Min.   :0.00653   Min.   :0.0000   Min.   :-0.0100  
##  1st Qu.:0.15629   1st Qu.:0.1340   1st Qu.: 0.1242  
##  Median :0.20976   Median :0.1840   Median : 0.1730  
##  Mean   :0.21883   Mean   :0.1928   Mean   : 0.1827  
##  3rd Qu.:0.28381   3rd Qu.:0.2500   3rd Qu.: 0.2400  
##  Max.   :0.51229   Max.   :0.4975   Max.   : 0.4925  
##  NA's   :25                                          
##  EstimatedEffectiveYield EstimatedLoss   EstimatedReturn 
##  Min.   :-0.183          Min.   :0.005   Min.   :-0.183  
##  1st Qu.: 0.116          1st Qu.:0.042   1st Qu.: 0.074  
##  Median : 0.162          Median :0.072   Median : 0.092  
##  Mean   : 0.169          Mean   :0.080   Mean   : 0.096  
##  3rd Qu.: 0.224          3rd Qu.:0.112   3rd Qu.: 0.117  
##  Max.   : 0.320          Max.   :0.366   Max.   : 0.284  
##  NA's   :29084           NA's   :29084   NA's   :29084   
##  ProsperRating..numeric. ProsperRating..Alpha.  ProsperScore  
##  Min.   :1.000                  :29084         Min.   : 1.00  
##  1st Qu.:3.000           C      :18345         1st Qu.: 4.00  
##  Median :4.000           B      :15581         Median : 6.00  
##  Mean   :4.072           A      :14551         Mean   : 5.95  
##  3rd Qu.:5.000           D      :14274         3rd Qu.: 8.00  
##  Max.   :7.000           E      : 9795         Max.   :11.00  
##  NA's   :29084           (Other):12307         NA's   :29084  
##  ListingCategory..numeric. BorrowerState  
##  Min.   : 0.000            CA     :14717  
##  1st Qu.: 1.000            TX     : 6842  
##  Median : 1.000            NY     : 6729  
##  Mean   : 2.774            FL     : 6720  
##  3rd Qu.: 3.000            IL     : 5921  
##  Max.   :20.000                   : 5515  
##                            (Other):67493  
##                     Occupation         EmploymentStatus
##  Other                   :28617   Employed     :67322  
##  Professional            :13628   Full-time    :26355  
##  Computer Programmer     : 4478   Self-employed: 6134  
##  Executive               : 4311   Not available: 5347  
##  Teacher                 : 3759   Other        : 3806  
##  Administrative Assistant: 3688                : 2255  
##  (Other)                 :55456   (Other)      : 2718  
##  EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
##  Min.   :  0.00           False:56459         False:101218    
##  1st Qu.: 26.00           True :57478         True : 12719    
##  Median : 67.00                                               
##  Mean   : 96.07                                               
##  3rd Qu.:137.00                                               
##  Max.   :755.00                                               
##  NA's   :7625                                                 
##                     GroupKey                 DateCreditPulled 
##                         :100596   2013-12-23 09:38:12:     6  
##  783C3371218786870A73D20:  1140   2013-11-21 09:09:41:     4  
##  3D4D3366260257624AB272D:   916   2013-12-06 05:43:16:     4  
##  6A3B336601725506917317E:   698   2014-01-14 20:17:49:     4  
##  FEF83377364176536637E50:   611   2014-02-09 12:14:41:     4  
##  C9643379247860156A00EC0:   342   2013-09-27 22:04:54:     3  
##  (Other)                :  9634   (Other)            :113912  
##  CreditScoreRangeLower CreditScoreRangeUpper
##  Min.   :  0.0         Min.   : 19.0        
##  1st Qu.:660.0         1st Qu.:679.0        
##  Median :680.0         Median :699.0        
##  Mean   :685.6         Mean   :704.6        
##  3rd Qu.:720.0         3rd Qu.:739.0        
##  Max.   :880.0         Max.   :899.0        
##  NA's   :591           NA's   :591          
##         FirstRecordedCreditLine CurrentCreditLines OpenCreditLines
##                     :   697     Min.   : 0.00      Min.   : 0.00  
##  1993-12-01 00:00:00:   185     1st Qu.: 7.00      1st Qu.: 6.00  
##  1994-11-01 00:00:00:   178     Median :10.00      Median : 9.00  
##  1995-11-01 00:00:00:   168     Mean   :10.32      Mean   : 9.26  
##  1990-04-01 00:00:00:   161     3rd Qu.:13.00      3rd Qu.:12.00  
##  1995-03-01 00:00:00:   159     Max.   :59.00      Max.   :54.00  
##  (Other)            :112389     NA's   :7604       NA's   :7604   
##  TotalCreditLinespast7years OpenRevolvingAccounts
##  Min.   :  2.00             Min.   : 0.00        
##  1st Qu.: 17.00             1st Qu.: 4.00        
##  Median : 25.00             Median : 6.00        
##  Mean   : 26.75             Mean   : 6.97        
##  3rd Qu.: 35.00             3rd Qu.: 9.00        
##  Max.   :136.00             Max.   :51.00        
##  NA's   :697                                     
##  OpenRevolvingMonthlyPayment InquiriesLast6Months TotalInquiries   
##  Min.   :    0.0             Min.   :  0.000      Min.   :  0.000  
##  1st Qu.:  114.0             1st Qu.:  0.000      1st Qu.:  2.000  
##  Median :  271.0             Median :  1.000      Median :  4.000  
##  Mean   :  398.3             Mean   :  1.435      Mean   :  5.584  
##  3rd Qu.:  525.0             3rd Qu.:  2.000      3rd Qu.:  7.000  
##  Max.   :14985.0             Max.   :105.000      Max.   :379.000  
##                              NA's   :697          NA's   :1159     
##  CurrentDelinquencies AmountDelinquent   DelinquenciesLast7Years
##  Min.   : 0.0000      Min.   :     0.0   Min.   : 0.000         
##  1st Qu.: 0.0000      1st Qu.:     0.0   1st Qu.: 0.000         
##  Median : 0.0000      Median :     0.0   Median : 0.000         
##  Mean   : 0.5921      Mean   :   984.5   Mean   : 4.155         
##  3rd Qu.: 0.0000      3rd Qu.:     0.0   3rd Qu.: 3.000         
##  Max.   :83.0000      Max.   :463881.0   Max.   :99.000         
##  NA's   :697          NA's   :7622       NA's   :990            
##  PublicRecordsLast10Years PublicRecordsLast12Months RevolvingCreditBalance
##  Min.   : 0.0000          Min.   : 0.000            Min.   :      0       
##  1st Qu.: 0.0000          1st Qu.: 0.000            1st Qu.:   3121       
##  Median : 0.0000          Median : 0.000            Median :   8549       
##  Mean   : 0.3126          Mean   : 0.015            Mean   :  17599       
##  3rd Qu.: 0.0000          3rd Qu.: 0.000            3rd Qu.:  19521       
##  Max.   :38.0000          Max.   :20.000            Max.   :1435667       
##  NA's   :697              NA's   :7604              NA's   :7604          
##  BankcardUtilization AvailableBankcardCredit  TotalTrades    
##  Min.   :0.000       Min.   :     0          Min.   :  0.00  
##  1st Qu.:0.310       1st Qu.:   880          1st Qu.: 15.00  
##  Median :0.600       Median :  4100          Median : 22.00  
##  Mean   :0.561       Mean   : 11210          Mean   : 23.23  
##  3rd Qu.:0.840       3rd Qu.: 13180          3rd Qu.: 30.00  
##  Max.   :5.950       Max.   :646285          Max.   :126.00  
##  NA's   :7604        NA's   :7544            NA's   :7544    
##  TradesNeverDelinquent..percentage. TradesOpenedLast6Months
##  Min.   :0.000                      Min.   : 0.000         
##  1st Qu.:0.820                      1st Qu.: 0.000         
##  Median :0.940                      Median : 0.000         
##  Mean   :0.886                      Mean   : 0.802         
##  3rd Qu.:1.000                      3rd Qu.: 1.000         
##  Max.   :1.000                      Max.   :20.000         
##  NA's   :7544                       NA's   :7544           
##  DebtToIncomeRatio         IncomeRange    IncomeVerifiable
##  Min.   : 0.000    $25,000-49,999:32192   False:  8669    
##  1st Qu.: 0.140    $50,000-74,999:31050   True :105268    
##  Median : 0.220    $100,000+     :17337                   
##  Mean   : 0.276    $75,000-99,999:16916                   
##  3rd Qu.: 0.320    Not displayed : 7741                   
##  Max.   :10.010    $1-24,999     : 7274                   
##  NA's   :8554      (Other)       : 1427                   
##  StatedMonthlyIncome                    LoanKey       TotalProsperLoans
##  Min.   :      0     CB1B37030986463208432A1:     6   Min.   :0.00     
##  1st Qu.:   3200     2DEE3698211017519D7333F:     4   1st Qu.:1.00     
##  Median :   4667     9F4B37043517554537C364C:     4   Median :1.00     
##  Mean   :   5608     D895370150591392337ED6D:     4   Mean   :1.42     
##  3rd Qu.:   6825     E6FB37073953690388BC56D:     4   3rd Qu.:2.00     
##  Max.   :1750003     0D8F37036734373301ED419:     3   Max.   :8.00     
##                      (Other)                :113912   NA's   :91852    
##  TotalProsperPaymentsBilled OnTimeProsperPayments
##  Min.   :  0.00             Min.   :  0.00       
##  1st Qu.:  9.00             1st Qu.:  9.00       
##  Median : 16.00             Median : 15.00       
##  Mean   : 22.93             Mean   : 22.27       
##  3rd Qu.: 33.00             3rd Qu.: 32.00       
##  Max.   :141.00             Max.   :141.00       
##  NA's   :91852              NA's   :91852        
##  ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
##  Min.   : 0.00                       Min.   : 0.00                  
##  1st Qu.: 0.00                       1st Qu.: 0.00                  
##  Median : 0.00                       Median : 0.00                  
##  Mean   : 0.61                       Mean   : 0.05                  
##  3rd Qu.: 0.00                       3rd Qu.: 0.00                  
##  Max.   :42.00                       Max.   :21.00                  
##  NA's   :91852                       NA's   :91852                  
##  ProsperPrincipalBorrowed ProsperPrincipalOutstanding
##  Min.   :    0            Min.   :    0              
##  1st Qu.: 3500            1st Qu.:    0              
##  Median : 6000            Median : 1627              
##  Mean   : 8472            Mean   : 2930              
##  3rd Qu.:11000            3rd Qu.: 4127              
##  Max.   :72499            Max.   :23451              
##  NA's   :91852            NA's   :91852              
##  ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
##  Min.   :-209.00             Min.   :   0.0           
##  1st Qu.: -35.00             1st Qu.:   0.0           
##  Median :  -3.00             Median :   0.0           
##  Mean   :  -3.22             Mean   : 152.8           
##  3rd Qu.:  25.00             3rd Qu.:   0.0           
##  Max.   : 286.00             Max.   :2704.0           
##  NA's   :95009                                        
##  LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination   LoanNumber    
##  Min.   : 0.00                 Min.   :  0.0              Min.   :     1  
##  1st Qu.: 9.00                 1st Qu.:  6.0              1st Qu.: 37332  
##  Median :14.00                 Median : 21.0              Median : 68599  
##  Mean   :16.27                 Mean   : 31.9              Mean   : 69444  
##  3rd Qu.:22.00                 3rd Qu.: 65.0              3rd Qu.:101901  
##  Max.   :44.00                 Max.   :100.0              Max.   :136486  
##  NA's   :96985                                                            
##  LoanOriginalAmount          LoanOriginationDate LoanOriginationQuarter
##  Min.   : 1000      2014-01-22 00:00:00:   491   Q4 2013:14450         
##  1st Qu.: 4000      2013-11-13 00:00:00:   490   Q1 2014:12172         
##  Median : 6500      2014-02-19 00:00:00:   439   Q3 2013: 9180         
##  Mean   : 8337      2013-10-16 00:00:00:   434   Q2 2013: 7099         
##  3rd Qu.:12000      2014-01-28 00:00:00:   339   Q3 2012: 5632         
##  Max.   :35000      2013-09-24 00:00:00:   316   Q2 2012: 5061         
##                     (Other)            :111428   (Other):60343         
##                    MemberKey      MonthlyLoanPayment LP_CustomerPayments
##  63CA34120866140639431C9:     9   Min.   :   0.0     Min.   :   -2.35   
##  16083364744933457E57FB9:     8   1st Qu.: 131.6     1st Qu.: 1005.76   
##  3A2F3380477699707C81385:     8   Median : 217.7     Median : 2583.83   
##  4D9C3403302047712AD0CDD:     8   Mean   : 272.5     Mean   : 4183.08   
##  739C338135235294782AE75:     8   3rd Qu.: 371.6     3rd Qu.: 5548.40   
##  7E1733653050264822FAA3D:     8   Max.   :2251.5     Max.   :40702.39   
##  (Other)                :113888                                         
##  LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees   
##  Min.   :    0.0              Min.   :   -2.35   Min.   :-664.87  
##  1st Qu.:  500.9              1st Qu.:  274.87   1st Qu.: -73.18  
##  Median : 1587.5              Median :  700.84   Median : -34.44  
##  Mean   : 3105.5              Mean   : 1077.54   Mean   : -54.73  
##  3rd Qu.: 4000.0              3rd Qu.: 1458.54   3rd Qu.: -13.92  
##  Max.   :35000.0              Max.   :15617.03   Max.   :  32.06  
##                                                                   
##  LP_CollectionFees  LP_GrossPrincipalLoss LP_NetPrincipalLoss
##  Min.   :-9274.75   Min.   :  -94.2       Min.   : -954.5    
##  1st Qu.:    0.00   1st Qu.:    0.0       1st Qu.:    0.0    
##  Median :    0.00   Median :    0.0       Median :    0.0    
##  Mean   :  -14.24   Mean   :  700.4       Mean   :  681.4    
##  3rd Qu.:    0.00   3rd Qu.:    0.0       3rd Qu.:    0.0    
##  Max.   :    0.00   Max.   :25000.0       Max.   :25000.0    
##                                                              
##  LP_NonPrincipalRecoverypayments PercentFunded    Recommendations   
##  Min.   :    0.00                Min.   :0.7000   Min.   : 0.00000  
##  1st Qu.:    0.00                1st Qu.:1.0000   1st Qu.: 0.00000  
##  Median :    0.00                Median :1.0000   Median : 0.00000  
##  Mean   :   25.14                Mean   :0.9986   Mean   : 0.04803  
##  3rd Qu.:    0.00                3rd Qu.:1.0000   3rd Qu.: 0.00000  
##  Max.   :21117.90                Max.   :1.0125   Max.   :39.00000  
##                                                                     
##  InvestmentFromFriendsCount InvestmentFromFriendsAmount   Investors      
##  Min.   : 0.00000           Min.   :    0.00            Min.   :   1.00  
##  1st Qu.: 0.00000           1st Qu.:    0.00            1st Qu.:   2.00  
##  Median : 0.00000           Median :    0.00            Median :  44.00  
##  Mean   : 0.02346           Mean   :   16.55            Mean   :  80.48  
##  3rd Qu.: 0.00000           3rd Qu.:    0.00            3rd Qu.: 115.00  
##  Max.   :33.00000           Max.   :25000.00            Max.   :1189.00  
## 

The Prosper Loan Data dataset contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, borrower employment status, borrower credit history, and the latest payment information.
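A minimal sketch of how a summary like the one above could be produced. The file name `prosperLoanData.csv` and the data frame name `loans` are assumptions, not taken from this report; a tiny made-up data frame stands in for the real data so the snippet runs on its own.

```r
# Assumed file name; uncomment if the CSV is available locally.
# loans <- read.csv("prosperLoanData.csv", stringsAsFactors = TRUE)

# Made-up stand-in with three of the 81 columns, for illustration only.
loans <- data.frame(
  LoanStatus  = factor(c("Current", "Completed", "Current", "Defaulted")),
  BorrowerAPR = c(0.21, 0.15, NA, 0.30),
  Term        = c(36, 36, 60, 12)
)

dims <- dim(loans)                   # number of loans (rows) and variables (columns)
na_counts <- colSums(is.na(loans))   # missing values per column
summary(loans)                       # per-column summary in the style shown above
```

On the real data, `dims` would be expected to match the 113,937 loans and 81 variables described above.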

Univariate Plots Section

##           A    AA     B     C     D     E    HR    NC 
## 84984  3315  3509  4389  5649  5153  3289  3508   141

The credit rating assigned when the listing went live. It applies only to listings from the pre-2009 period and is populated only for those listings, which is why 84,984 entries are blank. The y-axis is presented on a logarithmic scale.
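The counts in the table above span nearly three orders of magnitude, which is the usual reason for a logarithmic y-axis. The sketch below reuses those counts to show the share of blank grades and the values a log10 axis effectively plots; the variable names are illustrative.

```r
# Counts copied from the CreditGrade table above; "" marks listings with no grade.
credit_grade <- c(84984, 3315, 3509, 4389, 5649, 5153, 3289, 3508, 141)
names(credit_grade) <- c("", "A", "AA", "B", "C", "D", "E", "HR", "NC")

total <- sum(credit_grade)                 # all loans in the dataset
blank_share <- credit_grade[[1]] / total   # fraction of loans with no CreditGrade
log_counts <- log10(credit_grade)          # bar heights as seen on a log10 y-axis

round(blank_share, 3)
round(log_counts, 2)
```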

##              Cancelled             Chargedoff              Completed 
##                      5                  11992                  38074 
##                Current              Defaulted FinalPaymentInProgress 
##                  56576                   5018                    205 
##   Past Due (>120 days)   Past Due (1-15 days)  Past Due (16-30 days) 
##                     16                    806                    265 
##  Past Due (31-60 days)  Past Due (61-90 days) Past Due (91-120 days) 
##                    363                    313                    304

The current status of the loan: Cancelled, Chargedoff, Completed, Current, Defaulted, FinalPaymentInProgress, PastDue. The PastDue status will be accompanied by a delinquency bucket. The y-axis is presented on a logarithmic scale.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## 0.00653 0.15629 0.20976 0.21883 0.28381 0.51229      25

The Borrower’s Annual Percentage Rate (APR) for the loan. 25 NA’s have been removed in presentation of the data.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1340  0.1840  0.1928  0.2500  0.4975

The Borrower’s interest rate for this loan.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   1.000   3.000   4.000   4.072   5.000   7.000   29084

The Prosper Rating assigned at the time the listing was created: 0 - N/A, 1 - HR, 2 - E, 3 - D, 4 - C, 5 - B, 6 - A, 7 - AA. Applicable for loans originated after July 2009. 29084 NA’s have been removed in presentation of the data.
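The numeric-to-letter mapping described above can be sketched as a lookup. The helper name `rating_to_alpha` is hypothetical, and the input values are made up for illustration.

```r
# Lookup taken from the description above: 1 = HR, 2 = E, ..., 7 = AA.
rating_levels <- c("HR", "E", "D", "C", "B", "A", "AA")

# Hypothetical helper: convert numeric ratings to an ordered factor.
rating_to_alpha <- function(x) {
  x[which(x == 0)] <- NA   # 0 means "N/A" in the description above
  factor(rating_levels[x], levels = rating_levels, ordered = TRUE)
}

example <- rating_to_alpha(c(1, 4, 7, 0, NA))  # NA input stays NA
as.character(example)
```

Using an ordered factor keeps the risk ordering (HR lowest, AA highest), so comparisons such as `example[1] < example[3]` behave as expected.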

##           A    AA     B     C     D     E    HR 
## 29084 14551  5372 15581 18345 14274  9795  6935

The Prosper Rating assigned at the time the listing was created between AA - HR. Applicable for loans originated after July 2009.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    1.00    4.00    6.00    5.95    8.00   11.00   29084

A custom risk score built using historical Prosper data. The score is described as ranging from 1 to 10, with 10 being the best (lowest-risk) score, although the summary above shows a maximum of 11. Applicable for loans originated after July 2009. 29084 NA’s have been removed in presentation of the data.
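Because the summary above reports a maximum of 11.00, a quick range check is a reasonable first step before relying on the score. The sketch below uses a made-up vector; on the real data the input would presumably be the ProsperScore column.

```r
# Made-up scores for illustration only.
prosper_score <- c(1, 4, 6, 8, 10, 11, NA)

in_documented_range <- prosper_score >= 1 & prosper_score <= 10
n_outside <- sum(!in_documented_range, na.rm = TRUE)  # scores outside 1 to 10
n_missing <- sum(is.na(prosper_score))                # loans with no score

table(prosper_score, useNA = "ifany")
```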

##                                                        Accountant/CPA 
##                               3588                               3233 
##           Administrative Assistant                            Analyst 
##                               3688                               3602 
##                          Architect                           Attorney 
##                                213                               1046 
##                          Biologist                         Bus Driver 
##                                125                                316 
##                         Car Dealer                            Chemist 
##                                180                                145 
##                      Civil Service                             Clergy 
##                               1457                                196 
##                           Clerical                Computer Programmer 
##                               3164                               4478 
##                       Construction                            Dentist 
##                               1790                                 68 
##                             Doctor                Engineer - Chemical 
##                                494                                225 
##              Engineer - Electrical              Engineer - Mechanical 
##                               1125                               1406 
##                          Executive                            Fireman 
##                               4311                                422 
##                   Flight Attendant                       Food Service 
##                                123                               1123 
##            Food Service Management                          Homemaker 
##                               1239                                120 
##                           Investor                              Judge 
##                                214                                 22 
##                            Laborer                        Landscaping 
##                               1595                                236 
##                 Medical Technician                  Military Enlisted 
##                               1117                               1272 
##                   Military Officer                        Nurse (LPN) 
##                                346                                492 
##                         Nurse (RN)                       Nurse's Aide 
##                               2489                                491 
##                              Other                         Pharmacist 
##                              28617                                257 
##         Pilot - Private/Commercial  Police Officer/Correction Officer 
##                                199                               1578 
##                     Postal Service                          Principal 
##                                627                                312 
##                       Professional                          Professor 
##                              13628                                557 
##                       Psychologist                            Realtor 
##                                145                                543 
##                          Religious                  Retail Management 
##                                124                               2602 
##                 Sales - Commission                     Sales - Retail 
##                               3446                               2797 
##                          Scientist                      Skilled Labor 
##                                372                               2746 
##                      Social Worker         Student - College Freshman 
##                                741                                 41 
## Student - College Graduate Student           Student - College Junior 
##                                245                                112 
##           Student - College Senior        Student - College Sophomore 
##                                188                                 69 
##        Student - Community College         Student - Technical School 
##                                 28                                 16 
##                            Teacher                     Teacher's Aide 
##                               3759                                276 
##              Tradesman - Carpenter            Tradesman - Electrician 
##                                120                                477 
##               Tradesman - Mechanic                Tradesman - Plumber 
##                                951                                102 
##                       Truck Driver                    Waiter/Waitress 
##                               1675                                436

The Occupation selected by the Borrower at the time they created the listing. The y-axis is presented on a logarithmic scale.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00   26.00   67.00   96.07  137.00  755.00    7625

The length in months of the employment status at the time the listing was created. The x-axis’s upper range has been limited to 600 removing max outlier at 755.00. 7625 NA’s have been removed in presentation of the data.

## False  True 
## 56459 57478

A Borrower will be classified as a homeowner if they have a mortgage on their credit profile or provide documentation confirming they are a homeowner.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     0.0   660.0   680.0   685.6   720.0   880.0     591

The lower value representing the range of the borrower’s credit score as provided by a consumer credit rating agency. The x-axis’s upper and lower range has been limited to 450 to 900 removing min outlier at 0.0. 591 NA’s have been removed in presentation of the data.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    19.0   679.0   699.0   704.6   739.0   899.0     591

The upper value representing the range of the borrower’s credit score as provided by a consumer credit rating agency. The x-axis’s upper and lower range has been limited to 450 to 900 removing min outlier at 19.0. 591 NA’s have been removed in presentation of the data.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.0000  0.0000  0.0000  0.5921  0.0000 83.0000     697

Number of accounts delinquent at the time the credit profile was pulled. The x-axis’s upper range has been limited to 20 removing max outlier at 83.0000. 697 NA’s have been removed in presentation of the data.

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
##      0.0      0.0      0.0    984.5      0.0 463881.0     7622

Dollars delinquent at the time the credit profile was pulled. The x-axis’s upper range has been limited to 50000 removing max outlier at 463881.0. 7622 NA’s have been removed in presentation of the data.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.310   0.600   0.561   0.840   5.950    7604

The percentage of available revolving credit that is utilized at the time the credit profile was pulled. The x-axis’s upper range has been limited to 2.0 removing max outlier at 5.950. 7604 NA’s have been removed in presentation of the data.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.140   0.220   0.276   0.320  10.010    8554

The debt to income ratio of the borrower at the time the credit profile was pulled. This value is Null if the debt to income ratio is not available. This value is capped at 10.01 (any debt to income ratio larger than 1000% will be returned as 1001%). The x-axis’s upper range has been limited to 1.5 removing max outlier at 10.010. 8554 NA’s have been removed in presentation of the data.
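The capping rule described above can be sketched as follows. The helper name `cap_dti` is hypothetical and the input values are made up; the rule itself (anything above 1000% reported as 1001%) is taken from the description.

```r
# Hypothetical helper mirroring the cap described above:
# ratios above 10 (1000%) are reported as 10.01 (1001%); NA stays NA.
cap_dti <- function(ratio, cap = 10.01) {
  ifelse(ratio > 10, cap, ratio)
}

capped <- cap_dti(c(0.22, 10, 25, NA))
at_cap <- sum(capped == 10.01, na.rm = TRUE)  # how many values sit at the cap
capped
```

Counting values that sit exactly at the cap is a cheap way to see how much of the upper tail is censored rather than measured.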

##             $0      $1-24,999      $100,000+ $25,000-49,999 $50,000-74,999 
##            621           7274          17337          32192          31050 
## $75,000-99,999  Not displayed   Not employed 
##          16916           7741            806

The income range of the borrower at the time the listing was created.

##  False   True 
##   8669 105268

The borrower indicated they have the required documentation to support their income.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    3200    4667    5608    6825 1750003

The monthly income the borrower stated at the time the listing was created. The x-axis is presented on a square root scale and has been limited to 50000 removing max outlier at 1750003.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    4000    6500    8337   12000   35000

The origination amount of the loan. The y-axis is presented on a logarithmic scale and the x-axis on a square root scale.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0   131.6   217.7   272.5   371.6  2251.5

The scheduled monthly loan payment. The x-axis’s upper range has been limited to 1500 removing max outlier at 2251.5.

Univariate Analysis

What is the structure of your dataset?

This dataset contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, borrower employment status, borrower credit history, and the latest payment information.

What is/are the main feature(s) of interest in your dataset?

The main feature of interest is ProsperScore, a risk score on a scale of 1 to 10: how it is likely determined and which factors influence it.

What other features in the dataset do you think will help support your
investigation into your feature(s) of interest?

Features indicating the customer’s ability to take on debt, such as Occupation, CreditScoreRange, DebtToIncomeRatio, etc.

Did you create any new variables from existing variables in the dataset?

No new variables were created in the investigation of this dataset; 81 are more than enough!

Of the features you investigated, were there any unusual distributions?
Did you perform any operations on the data to tidy, adjust, or change the
form of the data? If so, why did you do this?

In general, all features were investigated using histograms with various bin widths, with some scaling of the x-axis using scale_x_sqrt() and some scaling of the y-axis using scale_y_log10(). element_text() was applied to the x-axis text to make the tick labels easier to read, and coord_cartesian() and xlim() were used to limit the x-axis. Otherwise, geom_bar() was used to show the counts of True and False, as with IncomeVerifiable and IsBorrowerHomeowner.
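A sketch of the plotting techniques named above, assuming ggplot2 is installed. The data frame is made up for illustration, and the bin counts, bin widths, and label angle are assumptions rather than the settings used for this report.

```r
library(ggplot2)

# Made-up data for illustration only, including one extreme outlier.
set.seed(1)
loans <- data.frame(
  StatedMonthlyIncome = c(rlnorm(500, meanlog = 8.4, sdlog = 0.6), 1750003),
  IsBorrowerHomeowner = sample(c("False", "True"), 501, replace = TRUE)
)

# Histogram with a square-root x-axis and a log10 y-axis.
p_scaled <- ggplot(loans, aes(x = StatedMonthlyIncome)) +
  geom_histogram(bins = 40) +
  scale_x_sqrt() +
  scale_y_log10()

# Histogram zoomed to 0-50000. coord_cartesian() keeps every row in the
# binning, whereas xlim() drops rows outside the limits before binning.
p_zoomed <- ggplot(loans, aes(x = StatedMonthlyIncome)) +
  geom_histogram(binwidth = 500) +
  coord_cartesian(xlim = c(0, 50000))

# Bar chart for a True/False feature, with rotated x-axis tick labels.
p_bar <- ggplot(loans, aes(x = IsBorrowerHomeowner)) +
  geom_bar() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
```

The distinction between coord_cartesian() and xlim() matters when outliers are present: zooming leaves the underlying statistics untouched, while xlim() changes which rows are counted.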

Bivariate Plots Section

##                    Employed     Full-time Not available  Not employed 
##          2255         67322         26355          5347           835 
##         Other     Part-time       Retired Self-employed 
##          3806          1088           795          6134

Bivariate boxplot of ProsperScore (y-axis) as a function of EmploymentStatus. Curiously, Self-employed borrowers have a lower ProsperScore than Not employed borrowers, as shown by comparing the medians of the corresponding boxes.
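The median comparison read off a boxplot can also be computed directly. The sketch below uses made-up scores purely to show the mechanics; it does not reproduce the values behind the plot.

```r
# Made-up scores for illustration only.
loans <- data.frame(
  EmploymentStatus = c("Full-time", "Full-time", "Not employed", "Not employed",
                       "Self-employed", "Self-employed", "Self-employed"),
  ProsperScore = c(7, 8, 5, 6, 4, 5, NA)
)

# Median ProsperScore per group: the centre line of each box in a boxplot.
medians <- tapply(loans$ProsperScore, loans$EmploymentStatus,
                  median, na.rm = TRUE)
sort(medians, decreasing = TRUE)
```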

## False  True 
## 56459 57478

Bivariate boxplot of ProsperScore as a function of IsBorrowerHomeowner. The medians of the two boxes are nearly identical, so owning a home has no appreciable effect on ProsperScore.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     0.0   660.0   680.0   685.6   720.0   880.0     591

Bivariate boxplot of ProsperScore as a function of CreditScoreRangeLower. As the medians of the boxes show, CreditScoreRangeLower is positively correlated with ProsperScore. 591 NA’s have been removed in presentation of the data.
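A visual impression of correlation can be backed by a coefficient. The sketch below uses made-up values and Spearman's rank correlation, which is one reasonable choice for an ordinal score; it is not a method stated in this report.

```r
# Made-up values for illustration only.
credit_lower  <- c(600, 640, 660, 680, 700, 720, 760, NA)
prosper_score <- c(2,   3,   5,   4,   6,   8,   9,   7)

# Spearman's rho works on ranks, which suits an ordinal score;
# "complete.obs" drops pairs where either value is NA.
rho <- cor(credit_lower, prosper_score,
           method = "spearman", use = "complete.obs")
rho
```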

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.140   0.220   0.276   0.320  10.010    8554

Bivariate multi-plot of DebtToIncomeRatio as a function of ProsperScore with red point stat_summary indicating the ProsperScore mean. ProsperScore is negatively correlated with DebtToIncomeRatio.

##             $0      $1-24,999      $100,000+ $25,000-49,999 $50,000-74,999 
##            621           7274          17337          32192          31050 
## $75,000-99,999  Not displayed   Not employed 
##          16916           7741            806

Bivariate boxplot of ProsperScore as a function of IncomeRange. Within the $25,000-74,999 income ranges, ProsperScore appears uniformly distributed.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    3200    4667    5608    6825 1750003

Bivariate boxplot of ProsperScore as a function of StatedMonthlyIncome on a logarithmic scale. As StatedMonthlyIncome approaches 1e+04, ProsperScore appears uniformly distributed.

Bivariate multi-plot of StatedMonthlyIncome as a function of ProsperScore with red point stat_summary indicating the ProsperScore mean. ProsperScore is uniformly distributed approaching 1e+04 on a logarithmic scale.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0   131.6   217.7   272.5   371.6  2251.5

Bivariate boxplot of MonthlyLoanPayment by ProsperScore. ProsperScore appears slightly positively correlated with MonthlyLoanPayment for payments below 500.

Bivariate multi-plot of MonthlyLoanPayment as a function of ProsperScore with red point stat_summary indicating the ProsperScore mean. ProsperScore appears slightly positively correlated with MonthlyLoanPayment for payments below 500.

Bivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. How did the feature(s) of interest vary with other features in
the dataset?

Full-time EmploymentStatus seems to garner a greater ProsperScore than Not employed, yet Not employed garners a greater ProsperScore than Self-employed. IsBorrowerHomeowner seems to have little bearing on ProsperScore, whose distribution is nearly the same for True and False. CreditScoreRangeLower, especially above 800, is positively correlated with ProsperScore. At low values, DebtToIncomeRatio is negatively correlated with ProsperScore, but the relationship vacillates as the ratio rises above 2.5. IncomeRange is positively correlated with ProsperScore above 50k. In general, ProsperScore seems uniformly distributed across StatedMonthlyIncome, but outliers with a StatedMonthlyIncome greater than 1,500,000 have lower scores than borrowers with a StatedMonthlyIncome of 250k or much less. MonthlyLoanPayment is slightly positively correlated with ProsperScore.

Did you observe any interesting relationships between the other features
(not the main feature(s) of interest)?

The most interesting relationships observed were that Not employed seems to garner a greater ProsperScore than Self-employed, which seems counter-intuitive, and also that DebtToIncomeRatio fluctuates in correlation to ProsperScore as it rises above 2.5.
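The Self-employed versus Not employed comparison comes down to comparing group medians, which can be checked numerically rather than read off the boxplots. The following is a minimal sketch in base R, assuming columns named ProsperScore and EmploymentStatus as in the dataset; the `loans` values are made-up toy data for illustration, not rows from the Prosper file.

```r
# Toy stand-in for the Prosper data (hypothetical values, illustration only)
loans <- data.frame(
  EmploymentStatus = c("Full-time", "Full-time", "Full-time",
                       "Not employed", "Not employed", "Not employed",
                       "Self-employed", "Self-employed", "Self-employed"),
  ProsperScore = c(7, 8, NA, 5, 6, 7, 4, 5, 6)
)

# Median ProsperScore per employment status, ignoring missing scores;
# these are the centre lines a boxplot draws for each group
score_medians <- tapply(loans$ProsperScore, loans$EmploymentStatus,
                        median, na.rm = TRUE)
print(score_medians)
```

Run against the real data frame, the same `tapply` call would show whether the median for Self-employed really sits below the median for Not employed.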

What was the strongest relationship you found?

The clearest finding was that IsBorrowerHomeowner appears to have no bearing on ProsperScore.

Multivariate Plots Section

Multivariate plot of ProsperScore as a function of IsBorrowerHomeowner for the Computer Programmer occupation. As evidenced by the nearly identical shading across almost all color bars, IsBorrowerHomeowner appears to have no correlation with ProsperScore.

Multivariate plot of ProsperScore as a function of IsBorrowerHomeowner for the Doctor occupation. As evidenced by the nearly identical shading across almost all color bars, IsBorrowerHomeowner appears to have no correlation with ProsperScore.

Multivariate plot of ProsperScore as a function of IsBorrowerHomeowner for the Tradesman - Plumber occupation. As evidenced by the nearly identical shading across almost all color bars, IsBorrowerHomeowner appears to have no correlation with ProsperScore.

Multivariate plot of CreditScoreRangeLower as a function of ProsperScore for the Computer Programmer, Doctor and Tradesman - Plumber occupations. CreditScoreRangeLower appears slightly positively correlated with ProsperScore for all three occupations.

Multivariate plot of CreditScoreRangeUpper as a function of ProsperScore for the Computer Programmer, Doctor and Tradesman - Plumber occupations. CreditScoreRangeUpper appears slightly positively correlated with ProsperScore for all three occupations.

Multivariate plot of DebtToIncomeRatio as a function of ProsperScore for the Computer Programmer, Doctor and Tradesman - Plumber occupations. DebtToIncomeRatio appears somewhat negatively correlated with ProsperScore for all three occupations, with many outliers below the given median.

Multivariate plot of StatedMonthlyIncome as a function of ProsperScore for the Computer Programmer, Doctor and Tradesman - Plumber occupations. StatedMonthlyIncome appears slightly positively correlated with ProsperScore for all three occupations, with a few outliers above the given median.

Multivariate plot of MonthlyLoanPayment as a function of ProsperScore for the Computer Programmer, Doctor and Tradesman - Plumber occupations. MonthlyLoanPayment appears somewhat positively correlated with ProsperScore for all three occupations, with some outliers above the given median.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. Were there features that strengthened each other in terms of
looking at your feature(s) of interest?

IsBorrowerHomeowner appears evenly divided between all three occupational classes.

For all three occupations, ProsperScore seems to roughly correspond to CreditScore based on the CreditScoreRangeUpper and CreditScoreRangeLower plots.

DebtToIncomeRatio appears somewhat negatively correlated with ProsperScore for all three occupations.

StatedMonthlyIncome appears slightly positively correlated with ProsperScore for all three occupations.

MonthlyLoanPayment appears somewhat positively correlated with ProsperScore for all three occupations.

Were there any interesting or surprising interactions between features?

That IsBorrowerHomeowner seems evenly divided within every occupational class is somewhat surprising. One would expect a greater percentage of Doctors, for instance, to own a home, or Doctors who already own a home not to need a loan. Perhaps the loans are for a second home?


Final Plots and Summary

ProsperScore as a function of EmploymentStatus

Bivariate boxplot of ProsperScore (y-axis) by EmploymentStatus. Curiously, Self-employed borrowers have a lower ProsperScore than Not employed borrowers: the median line of the Self-employed boxplot sits below the median line of the Not employed boxplot.

ProsperScore as a function of IsBorrowerHomeowner

Bivariate boxplot of ProsperScore by IsBorrowerHomeowner. The medians of the two boxplots are nearly identical, so owning a home appears to have no appreciable effect on one’s ProsperScore.

ProsperScore as a function of DebtToIncomeRatio and Occupation

Multivariate plot of DebtToIncomeRatio as a function of ProsperScore for the Computer Programmer, Doctor and Tradesman - Plumber occupations. DebtToIncomeRatio appears somewhat negatively correlated with ProsperScore for all three occupations, with many outliers below the given median.


Reflection

The Prosper Loan Data is large and incorporates many variables, which makes it difficult to decide what to focus on. After much study and reflection, ProsperScore presented itself as deserving a more thorough examination. What is ProsperScore and, if loans are presumably granted based upon it, how is it determined, given its proprietary nature?

Exploring the Prosper Loan Data with regard to ProsperScore met with a few difficulties due to the sprawling number of variables and their size. For instance, plotting every Occupation listed surpassed ggplot’s ability to display them legibly, so I settled upon three Occupations to roughly represent the blue collar, white collar and professional realms and thereby delineate possible relationships more simply and clearly.
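Restricting the data to a handful of occupations can be sketched as follows. This is a minimal base R example, assuming Occupation is stored as a factor; the `loans` data frame and its values are made-up toy data for illustration, and the extra occupation labels in it are hypothetical.

```r
# Toy stand-in (hypothetical values) with more occupations than we want to plot
loans <- data.frame(
  Occupation   = factor(c("Doctor", "Teacher", "Computer Programmer",
                          "Tradesman - Plumber", "Clerical", "Doctor")),
  ProsperScore = c(8, 6, 7, 5, 4, 9)
)

# The three occupations used in the multivariate plots
keep <- c("Computer Programmer", "Doctor", "Tradesman - Plumber")

# Filter the rows, then drop unused factor levels so the removed
# occupations do not appear as empty panels or legend entries
three_occ <- droplevels(subset(loans, Occupation %in% keep))
print(table(three_occ$Occupation))
```

Dropping the unused levels matters for plotting because a factor otherwise keeps every original occupation as a level even after the rows are removed.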

Ultimately, it seems that ProsperScore is a complex calculation that cannot be easily pinned down via an analysis of three variables, as no doubt each of the Prosper dataset’s 81 variables plays a part in a much more complex determination of a customer’s final score. This conclusion is bolstered by the somewhat counter-intuitive findings from exploring and analyzing the dataset in R: for instance, that someone Self-employed would receive a lower ProsperScore than someone Not employed, that IsBorrowerHomeowner being True receives no preference in ProsperScore, or, as an outlier, that a Doctor’s ProsperScore can rise as the DebtToIncomeRatio climbs. How these variables are offset by other, as yet unexamined, variables can be a focus of future work within the dataset.